OpenCluster: A Flexible Distributed Computing Framework for Astronomical Data Processing

نویسندگان

  • Shoulin Wei
  • Feng Wang
  • Hui Deng
  • Cuiyin Liu
  • Wei Dai
  • Bo Liang
  • Ying Mei
  • Congming Shi
  • Yingbo Liu
  • Jingping Wu
چکیده

The volume of data generated by modern astronomical telescopes is extremely large and rapidly growing. However, current high-performance data processing architectures/frameworks are not well suited for astronomers because of their limitations and programming difficulties. In this paper, we therefore present OpenCluster, an open-source distributed computing framework to support rapidly developing high-performance processing pipelines of astronomical big data. We first detail the OpenCluster design principles and implementations and present the APIs facilitated by the framework. We then demonstrate a case in which OpenCluster is used to resolve complex data processing problems for developing a pipeline for the Mingantu Ultrawide Spectral Radioheliograph. Finally, we present our OpenCluster performance evaluation. Overall, OpenCluster provides not only high fault tolerance and simple programming interfaces, but also a flexible means of scaling up the number of interacting entities. OpenCluster thereby provides an easily integrated distributed computing framework for quickly developing a high-performance data processing system of astronomical telescopes and for significantly reducing software development expenses.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Corral Framework: Trustworthy and Fully Functional Data Intensive Parallel Astronomical Pipelines

Data processing pipelines are one of most common astronomical software. This kind of programs are chains of processes that transform raw data into valuable information. In this work a Python framework for astronomical pipeline generation is presented. It features a design pattern (Model-View-Controller) on top of a SQL Relational Database capable of handling custom data models, processing stage...

متن کامل

Entropy-based Consensus for Distributed Data Clustering

The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...

متن کامل

Architecture of processing and analysis system for big astronomical data

This work explores the use of big data technologies deployed in the cloud for processing of astronomical data. We have applied Hadoop and Spark to the task of co-adding astronomical images. We compared the overhead and execution time of these frameworks. We conclude that performance of both frameworks is generally on par. The Spark API is more flexible, which allows one to easily construct astr...

متن کامل

Unleashing the Power of Distributed CPU/GPU Architectures: Massive Astronomical Data Analysis and Visualization case study

Upcoming and future astronomy research facilities will systematically generate terabyte-sized data sets moving astronomy into the Petascale data era. While such facilities will provide astronomers with unprecedented levels of accuracy and coverage, the increases in dataset size and dimensionality will pose serious computational challenges for many current astronomy data analysis and visualizati...

متن کامل

An Efficient Resource Allocation for Processing Healthcare Data in the Cloud Computing Environment

Nowadays, processing large-media healthcare data in the cloud has become an effective way of satisfying the medical userschr('39') QoS (quality of service) demands. Providing healthcare for the community is a complex activity that relies heavily on information processing. Such processing can be very costly for organizations. However, processing healthcare data in cloud has become an effective s...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1701.04907  شماره 

صفحات  -

تاریخ انتشار 2017